LSSL-SSD: Social Spammer Detection with Laplacian Score and Semi-supervised Learning
نویسندگان
چکیده
The rapid development of social networks makes it easy for people to communicate online. However, social networks always suffer from social spammers due to their openness. Spammers deliver information for economic purposes, and they pose threats to the security of social networks. To maintain the long-term running of online social networks, many detection methods are proposed. But current methods normally use high dimension features with supervised learning algorithms to find spammers, resulting in low detection performance. To solve this problem, in this paper, we first apply the Laplacian score method, which is an unsupervised feature selection method, to obtain useful features. Based on the selected features, the semi-supervised ensemble learning is then used to train the detection model. Experimental results on the Twitter dataset show the efficiency of our approach after feature selection. Moreover, the proposed method remains high detection performance in the face of limited labeled data.
منابع مشابه
Graph Laplacian for Semi-supervised Feature Selection in Regression Problems
Feature selection is fundamental in many data mining or machine learning applications. Most of the algorithms proposed for this task make the assumption that the data are either supervised or unsupervised, while in practice supervised and unsupervised samples are often simultaneously available. Semi-supervised feature selection is thus needed, and has been studied quite intensively these past f...
متن کاملMEFUASN: A Helpful Method to Extract Features using Analyzing Social Network for Fraud Detection
Fraud detection is one of the ways to cope with damages associated with fraudulent activities that have become common due to the rapid development of the Internet and electronic business. There is a need to propose methods to detect fraud accurately and fast. To achieve to accuracy, fraud detection methods need to consider both kind of features, features based on user level and features based o...
متن کاملLeveraging Careful Microblog Users for Spammer Detection
Microblogging websites, e.g. Twitter and Sina Weibo, have become a popular platform for socializing and sharing information in recent years. Spammers have also discovered this new opportunity to unfairly overpower normal users with unsolicited content, namely social spams. While it is intuitive for everyone to follow legitimate users, recent studies show that both legitimate users and spammers ...
متن کاملGeneralized Optimization Framework for Graph-based Semi-supervised Learning
We develop a generalized optimization framework for graphbased semi-supervised learning. The framework gives as particular cases the Standard Laplacian, Normalized Laplacian and PageRank based methods. We have also provided new probabilistic interpretation based on random walks and characterized the limiting behaviour of the methods. The random walk based interpretation allows us to explain dif...
متن کاملLocality sensitive semi-supervised feature selection
In many computer vision tasks like face recognition and image retrieval, one is often confronted with high-dimensional data. Procedures that are analytically or computationally manageable in low-dimensional spaces can become completely impractical in a space of several hundreds or thousands dimensions. Thus, various techniques have been developed for reducing the dimensionality of the feature s...
متن کامل